HyenaDNA | Open-Source StyleDrop | Data Poisoning

Update: 2023-07-05

Description

Welcome back to AI Daily! Today we discuss three great stories, starting with HyenaDNA. The application of the hyena model in DNA sequencing - enabling models to handle a million context length and revolutionizing our understanding of genomics. Secondly, we cover the exciting open-source implementation of StyleDrop - a tool that's making waves in the world of image editing and style replacement. Finally, we delve into the topic of data poisoning - how a small amount of injected data can drastically alter the outcome of an instruction tuning and the implications this has for AI security.

Key Points:

1️⃣ HyenaDNA

* HyenaDNA utilizes sub-quadratic scaling for DNA sequences, enabling a million context length, each a unique nucleotide, trained on 3 trillion tokens.

* HyenaDNA, setting a new state-of-the-art in genomics benchmarks, could predict gene expression changes, elucidating protein creation from genetic polymorphisms.

* It's 160 times faster than previous LLMs, fitting on a single CoLab, showcasing the potential to outperform transformers and attention models.

2️⃣ Open-Source StyleDrop

* An open-source version of Style Drop, an image editing and style replacing tool, has been implemented and made available for public use.

* Style Drop outperforms comparable models and offers comprehensive instructions for setup, allowing users to experiment with stylizing lettering and more.

* Following a pattern set by Dream Booth, Style Drop went from being a Google research paper to being implemented as an open-source project on GitHub.

3️⃣ Data Poisoning

* Two papers discuss data poisoning, a technique where information like ads or SEO can be injected into LLMs, impacting their responses and recommendations.

* Even a small number of examples in a dataset can effectively "poison" it, significantly altering the output of a language model during fine tuning.

* This technique is expected to be used with open-source datasets for fine-tuning, similar to how publishers put fake words in dictionaries to trace usage.

🔗 Episode Links

* HyenaDNA

* StyleDop

* Data Poisoning

* OpenAI

Follow us on Twitter:

* AI Daily

* Farb

* Ethan

* Conner

Subscribe to our Substack:

* Subscribe